Any unitary operation in quantum information processing can be implemented via a sequence of simpler
steps - quantum gates. However, the actual implementation of a quantum gate is always imperfect and takes a finite
time. Therefore, finding a short sequence of gates - an efficient quantum circuit for a given operation - is an
important task. We contribute to this issue by proposing an optimization of the well-known universal procedure
of Barenco et al. [1]. We also created a computer program which implements both Barenco's decomposition
and the proposed optimization. Furthermore, our optimization can be applied to any quantum circuit containing
generalized Toffoli gates, including basic quantum gate circuits.

1 Introduction 

Practical realization of quantum information processing requires the ability to prepare a quantum system in a chosen
state, to perform the desired operation on it and to read out the outcome via a measurement. These tasks can be
carried out on a collection of two-level quantum systems - qubits. Since it is very difficult (practically impossible)
to properly control the simultaneous interaction among many qubits, the desired operation is usually performed
as a sequence of simpler operations - quantum gates. A gate is accomplished by a temporal evolution of the
system during which only a few qubits interact simultaneously. The complicated operation is then built up by a
sequence of quantum gates - a quantum circuit containing experimentally feasible gates. In the quantum circuit
model [2] the system of qubits is described as a closed quantum system. Therefore the time evolution is unitary,
and each quantum gate is a unitary operator.

A set of (experimentally realizable) gates is called a quantum gate library. In the rest of the paper we
work with the basic-gate library [1], which contains all one-qubit rotations and the Controlled-NOT (CNOT)
gate. This library is universal in the sense that any unitary operation can be exactly achieved by a quantum
circuit containing only a finite number of gates from the basic-gate library. This universality was shown
constructively in 1995 by Barenco et al. [1]. Since that time, much effort has been made to propose a universal
technique for finding an efficient quantum circuit for a general unitary operation. Many research groups focused
on searching for a universal n-qubit circuit containing the lowest possible number of CNOT gates (see e.g.
[3], [4], [5]), that is, a circuit capable of realizing any unitary operator by tuning the circuit's one-qubit gates.
The number of CNOT gates is important both for its relation to the execution time of the circuit and
from the point of view of the complexity of its experimental implementation.

Shende, Markov and Bullock [6] showed by dimension-counting arguments that universal n-qubit circuits
have to contain at least $\lceil \frac{1}{4}(4^n - 3n - 1) \rceil$ CNOT gates. Although universal circuits with the lowest
number of CNOTs are known for the special case of 2 qubits (see Refs. [6], [7], [8]), for a higher number of qubits
this remains an open problem. Furthermore, we have no guarantee that tuning the single-qubit gates to an operator
that can be realized efficiently will lead to a straightforward simplification of the universal circuit. Therefore,
universal n-qubit circuits often contain an exponential number of CNOT gates (with respect to n) even for n-qubit
operations realizable by a polynomial number of CNOTs. The intricacy of the simplification (optimization) of
universal circuits for a chosen operator can be seen in the case of two-qubit operators, where the circuits with
the lowest number of CNOT gates for a given operator were found by other means. On the other hand, for
more than two qubits, the universal circuits are the only standard approach to finding a quantum circuit for an
arbitrary given operator.

In the present paper we show a simplification of the universal n-qubit circuit proposed by Barenco et al. [1].
We try to find, for an arbitrary given unitary operator, a quantum circuit containing the lowest possible number
of CNOT gates. Barenco's decomposition utilizes the decomposition of a unitary matrix into a product of a
diagonal matrix and two-level matrices. After this decomposition one obtains a preliminary quantum circuit
containing generalized Toffoli gates. These gates are finally implemented by other constructions using basic
quantum gates.

We examined some natural questions concerning generalized Toffoli gates and we propose an optimization
algorithm which combines the properties we found. This optimization algorithm can be applied to any
quantum circuit containing generalized Toffoli gates, including circuits containing basic gates. Hence, we
can utilize our optimization in several stages of Barenco's decomposition. To perform Barenco's decomposition
and the proposed optimization, we created a computer program (the program can be downloaded from
www.quniverse.sk/people/sedlak/), which was used to estimate the efficiency of the optimization.

The rest of the paper is organized as follows. We start with a definition of the generalized Toffoli gate,
which is followed by a brief sketch of the decomposition of Barenco et al. More details about the procedure can
be found in Refs. [1] and [10]. In Section 3, we examine the properties of generalized Toffoli gates, which are
combined to create an optimization algorithm in Section 4. The results obtained by the optimization algorithm
are summarized in Section 5.

2 Preliminaries 

2.1 Definition of the generalized Toffoli gate $\wedge_m(A)$

The generalized Toffoli gate $\wedge_m(A)$ is an (m + 1)-qubit gate with m control qubits $j_1, \ldots, j_m$ and one target
qubit $j_0$. The action of the gate on computational-basis vectors reads:

$|x_1, \ldots, x_{j_0}, \ldots, x_n\rangle \mapsto |x_1, \ldots\rangle \otimes A^{x_{j_1} x_{j_2} \cdots x_{j_m}} |x_{j_0}\rangle \otimes |\ldots, x_n\rangle$,   (1)

where A is an operator (2x2 matrix) acting on one qubit. Thus the target qubit of a computational-basis vector
is affected by the operator A only if all control qubits are in the state $|1\rangle$.
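
The definition (1) can be turned into a matrix directly. The following is a minimal sketch in plain Python, not the program mentioned in the Introduction; the function name `generalized_toffoli`, the nested-list matrix format and the convention that qubit 0 is the leftmost qubit of the ket are assumptions made for this illustration only.

```python
def generalized_toffoli(n, controls, target, a):
    """Return the 2^n x 2^n matrix of the gate as nested lists.

    Qubit 0 is the leftmost qubit in |x_1 ... x_n> (a convention chosen for
    this sketch); `a` is a 2x2 matrix given as nested lists.
    """
    dim = 2 ** n
    shift = n - 1 - target                  # bit position of the target qubit
    u = [[0j] * dim for _ in range(dim)]
    for x in range(dim):                    # x encodes the input basis vector
        if all((x >> (n - 1 - c)) & 1 for c in controls):
            b = (x >> shift) & 1            # current value of the target bit
            for b_new in (0, 1):
                y = (x & ~(1 << shift)) | (b_new << shift)
                u[y][x] += a[b_new][b]      # A acts on the target qubit
        else:
            u[x][x] = 1 + 0j                # some control is |0>: nothing happens
    return u


sigma_x = [[0, 1], [1, 0]]
# The ordinary Toffoli gate: controls on qubits 0 and 1, target on qubit 2.
toffoli = generalized_toffoli(3, controls=[0, 1], target=2, a=sigma_x)
```

With A = sigma_x and two control qubits this reproduces the ordinary Toffoli gate: only the basis vectors |110> and |111> are exchanged. An empty `controls` list gives a plain one-qubit gate.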

2.2 Brief sketch of Barenco's decomposition

Figure 1: Examples from parts of the decomposition of Barenco et al.

The procedure can be divided into four steps:

• Step 1 - QR decomposition. The matrix of the chosen operator U is written as:

$U = D \cdot T_{2,1} \cdot T_{3,1} \cdot T_{3,2} \cdots T_{2^n,2^n-2} \cdot T_{2^n,2^n-1}$,   (2)

where D is a diagonal phase matrix and $T_{p,q}$ are matrices acting nontrivially only in two-dimensional
subspaces, which are given by pairs of computational-basis vectors.

• Step 2 - Decomposition of the matrices $T_{p,q}$ and D into generalized Toffoli gates. A general matrix $T_{p,q}$
operates on one pair of distinct computational-basis vectors. However, a generalized Toffoli gate $\wedge_{n-1}(A)$
changes one pair of computational-basis vectors which differ in only one qubit. Therefore, we first use
$\wedge_{n-1}(\sigma_x)$ gates and NOT gates to perform a permutation which takes the pair of computational-basis
vectors given by $T_{p,q}$ to vectors differing in only one qubit. Then we apply the appropriate $\wedge_{n-1}(A)$ gate
(together with some NOT gates) and finally undo the permutation. This allows us to implement every matrix
$T_{p,q}$. To build the entries of the diagonal matrix D, we use $\wedge_{n-1}(\mathrm{diag}(\cdot,\cdot))$ gates surrounded by pairs of NOT
gates. As an example, Figure 1a shows the decomposition of the matrix $T_{8,1}$ operating between the
vectors $|000\rangle$ and $|111\rangle$.

• Step 3 - Simplification of generalized Toffoli gates. The gates $\wedge_{n-1}(A)$ are implemented by controlled
1-qubit gates (less complicated generalized Toffoli gates $\wedge_1(V)$). As an example, Figure 1b shows the
simplification of the $\wedge_2(U)$ gate.

• Step 4 - Decomposition of $\wedge_1(V)$ gates into basic quantum gates. This step finishes the decomposition
by using only basic quantum gates in the circuit. The example in Figure 1c shows the worst-case
decomposition of the $\wedge_1(V)$ gate into four 1-qubit gates and two CNOT gates. For some $\wedge_1(V)$ fewer
gates suffice.
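
Step 1 can be illustrated by a short plain-Python sketch that factors a unitary matrix into a diagonal phase matrix times two-level matrices. It is an illustration under stated assumptions rather than the procedure of Ref. [1]: the elimination is done by column rotations, the order of the two-level factors is the one this particular elimination produces and need not match Eq. (2), and the names `two_level_decompose`, `matmul`, `dagger` and `identity` are hypothetical helpers.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def identity(d):
    return [[complex(i == j) for j in range(d)] for i in range(d)]


def two_level_decompose(u, tol=1e-12):
    """Return (D, [T_1, ..., T_m]) such that U = D . T_1 . T_2 ... T_m."""
    d = len(u)
    work = [[complex(x) for x in row] for row in u]
    factors = []
    for r in range(d - 1):
        for c in range(r + 1, d):
            a, b = work[r][r], work[r][c]
            if abs(b) < tol:
                continue                      # entry already vanishes
            norm = math.hypot(abs(a), abs(b))
            g = identity(d)                   # two-level unitary mixing columns r and c
            g[r][r], g[r][c] = a.conjugate() / norm, -b / norm
            g[c][r], g[c][c] = b.conjugate() / norm, a / norm
            work = matmul(work, g)            # zeroes the entry in row r, column c
            factors.append(dagger(g))
    factors.reverse()                         # U.G_1...G_m = D  =>  U = D.G_m^+...G_1^+
    return work, factors


# Example input: the two-qubit quantum Fourier transform, a 4x4 unitary matrix.
qft = [[0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5j, -0.5, -0.5j],
       [0.5, -0.5, 0.5, -0.5],
       [0.5, -0.5j, -0.5, 0.5j]]
d_mat, t_list = two_level_decompose(qft)
rebuilt = d_mat
for t in t_list:
    rebuilt = matmul(rebuilt, t)
error = max(abs(rebuilt[i][j] - qft[i][j]) for i in range(4) for j in range(4))
```

For a 4x4 unitary at most six two-level factors appear, one per entry above the diagonal, and `error` measures how well the product of the factors reproduces the input.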

3 Properties of generalized Toffoli gates 

By renaming qubits, each pair of generalized Toffoli gates can be drawn and denoted as shown in Figure 2.

Since every linear operator is fully defined through its action on basis vectors, we will show equality of two
operators via their action on computational-basis vectors. In what follows, the action of the gates b1, b2 is
always described with respect to the computational basis.

Figure 2: Examining properties of generalized Toffoli gates (the circuit b1 b2 compared with the circuit b2 b1).
Commuting of a pair of gates means: C = B, D = A.
Order exchange of two gates with modification of one gate means: C = B, D ≠ A or C ≠ B, D = A.
Qubit types: t1 - target qubit of gate b1; t2 - target qubit of gate b2; t3 - control qubits of both gates
b1 and b2; t4 - control qubits of gate b1 only; t5 - control qubits of gate b2 only; t0 - qubits not
belonging to gates b1 and b2.

3.1 Commutativity of a pair of gates

Qubits of the types t0, t3, t4 and t5 (see the definition in Figure 2) do not affect the commutativity of the gates
b1 = $\wedge_k(A)$ and b2 = $\wedge_l(B)$, because they control the action on qubits of the types t1, t2 in the same way in
both orderings. From this property it follows that it suffices to examine the cases M1 - M5. In each of these
cases the action of the gates on computational-basis vectors in both orderings yields four equations, which give
constraints on the entries of the 2x2 matrices A and B. In the case M1 the gates b1 and b2
commute if the matrices A and B commute. In the case M2 the gates always commute. In the case M3 the gates
b1 and b2 commute if the matrix B is diagonal. In the case M4 the gates commute if the matrix A is diagonal.
Thus in the cases M3 and M4 the gates b1 and b2 commute if a diagonal matrix B (resp. A) passes
through the control qubit of the neighbouring gate. The situation in the case M5 is a bit different: it
turns out that there are three ways to fulfill the aforementioned equations: i) both matrices A and B are
diagonal, ii) $A = \mathrm{diag}(e^{i\alpha}, 1)$ with no constraint on B, iii) $B = \mathrm{diag}(e^{i\beta}, 1)$ with no constraint on A.
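
Two of these statements can be checked numerically on two qubits. The sketch below is plain Python with hypothetical helper names (`gate`, `matmul`, `commute`). Since the drawings of the cases M1 - M5 are not available here, the wiring is an assumption based on the text: "a diagonal matrix passing through the control qubit" for M3/M4, and "each target is a control qubit of the other gate" for M5.

```python
import cmath
import math


def gate(n, controls, target, a):
    """Matrix of a generalized Toffoli gate; qubit 0 is the leftmost qubit of the ket."""
    dim, shift = 2 ** n, n - 1 - target
    u = [[0j] * dim for _ in range(dim)]
    for x in range(dim):
        if all((x >> (n - 1 - c)) & 1 for c in controls):
            b = (x >> shift) & 1
            for b_new in (0, 1):
                u[(x & ~(1 << shift)) | (b_new << shift)][x] += a[b_new][b]
        else:
            u[x][x] = 1 + 0j
    return u


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def commute(u, v, tol=1e-9):
    uv, vu = matmul(u, v), matmul(v, u)
    return all(abs(p - q) < tol for ru, rv in zip(uv, vu) for p, q in zip(ru, rv))


r = 1 / math.sqrt(2)
hadamard = [[r, r], [r, -r]]
sigma_x = [[0, 1], [1, 0]]
phase = [[1, 0], [0, cmath.exp(0.7j)]]

# A one-qubit gate B acting on the control qubit (qubit 1) of b1 = controlled-NOT.
b1 = gate(2, [1], 0, sigma_x)
diag_on_control = commute(b1, gate(2, [], 1, phase))        # B diagonal
nondiag_on_control = commute(b1, gate(2, [], 1, hadamard))  # B not diagonal

# Mutual control: the target of each gate is the control of the other one.
b2_mutual = gate(2, [0], 1, hadamard)                       # arbitrary B
mutual_special = commute(gate(2, [1], 0, [[cmath.exp(0.3j), 0], [0, 1]]), b2_mutual)
mutual_generic = commute(gate(2, [1], 0, phase), b2_mutual)
```

Here `diag_on_control` and `mutual_special` come out true, while `nondiag_on_control` and `mutual_generic` come out false, in line with the conditions stated for the cases M3 and M5 (option ii).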

3.2 Exchange of two gates with modification of one gate 

We consider only a modification of the one-qubit operator (2x2 matrix) from the definition of the generalized Toffoli
gate (1). We can consider the gates as not commuting, because otherwise it follows from unitarity that
neither of the gates can be modified. Let us look at the exchange of the gates b1 = $\wedge_k(A)$, b2 = $\wedge_l(B)$ for the gates
b2n = $\wedge_l(C)$, b1 = $\wedge_k(A)$. The previous argument tells us that $B \neq C$. If the gate b1 had qubits t4, then there
would exist a computational-basis vector with at least one qubit t4 in the state $|0\rangle$ and all qubits t1, t3, t5
in the state $|1\rangle$, which would reveal the difference between B and C, i.e. the difference between the gates b2 and b2n.
Thus if we want this exchange to be possible, the gate b1 must not involve qubits t4. The gates can have the other
types of qubits, because these do not enable separate action of the gates b2 and b2n. Similarly to the case
of the commutativity of gates, it remains to examine the cases M1, M3 - M5 and to solve similar equations.
The results of these technical calculations are presented in Table 1. For completeness, the conditions for the
exchange of the gates b1 = $\wedge_k(A)$, b2 = $\wedge_l(B)$ for the gates b2 = $\wedge_l(B)$, b1n = $\wedge_k(D)$ (completely analogous to the
previous one) are summarized in Table 2.

Table 1: Exchange of gates b1 = $\wedge_k(A)$, b2 = $\wedge_l(B)$ for gates b1 = $\wedge_l(C)$, b2 = $\wedge_k(A)$, which is possible only if
there are no qubits t4 (see Figure 2).

Table 2: Exchange of gates b1 = $\wedge_k(A)$, b2 = $\wedge_l(B)$ for gates b1 = $\wedge_l(B)$, b2 = $\wedge_k(D)$, which is possible only if
there are no qubits t5 (see Figure 2).
3.3 Conditions for merging two gates into one

First of all, we examine the circumstances under which two generalized Toffoli gates form the identity. This
will enable us to formulate conditions for merging two generalized Toffoli gates into one. We consider the gates
b1 = $\wedge_k(A)$ and b2 = $\wedge_l(B)$, both different from the identity. The gates b1 and b2 must not involve the qubits
t4 and t5, because such qubits let either the gate b1 or the gate b2 act alone on some subspace of the Hilbert
space, where its difference from the identity would be visible. It suffices to examine the cases M1 - M5 (see
Figure 2), because the gates b1 and b2 act nontrivially only on computational-basis vectors with all qubits of
the type t3 in the state $|1\rangle$ and do not modify qubits other than t1 and t2. So in each case we write down the
transformation carried out by the gates b1, b2 and require it to be the identity. The resulting constraints on the
elements of the matrices A and B are presented in Table 3.

Case   Constraints on A, B
M1     $A \cdot B = 1$
M2     $A = 1 \cdot e^{i\varphi}$, $B = 1 \cdot e^{-i\varphi}$
M3     $A = \mathrm{diag}(e^{-i\varphi}, e^{-i\varphi})$, $B = \mathrm{diag}(1, e^{i\varphi})$
M4     $A = \mathrm{diag}(1, e^{i\varphi})$, $B = \mathrm{diag}(e^{-i\varphi}, e^{-i\varphi})$
M5     $A = \mathrm{diag}(1, e^{i\varphi})$, $B = \mathrm{diag}(1, e^{-i\varphi})$

Table 3: Two generalized Toffoli gates b1 = $\wedge_k(A)$ and b2 = $\wedge_l(B)$ form the identity if they do not have qubits
of the types t4, t5 (see Figure 2) and fulfill these constraints.

Obviously, if two neighbouring generalized Toffoli gates form the identity, we remove them from the circuit.
Also, if two such neighbouring gates b1 = $\wedge_k(A)$ and b2 = $\wedge_l(B)$ fail to form the identity only because one of
the matrices A or B does not fulfill the constraints from Table 3, it is possible to simplify the circuit. It
suffices to suitably divide the gate b1 or b2 as shown in Figure 3a and to remove the pair of gates forming the
identity. This finally leads to merging the two gates into one.
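
The merging rule and some of the identity constraints can be checked numerically. The plain-Python sketch below uses hypothetical helper names (`gate`, `matmul`, `close`, `identity`), and the wiring of the last two checks is our reading of the corresponding cases, not a reproduction of Figure 2. It verifies that two gates with the same control and target qubits merge into a single gate carrying the product of their matrices, and that three pairs of gates multiply to the identity.

```python
import cmath


def gate(n, controls, target, a):
    """Matrix of a generalized Toffoli gate; qubit 0 is the leftmost qubit of the ket."""
    dim, shift = 2 ** n, n - 1 - target
    u = [[0j] * dim for _ in range(dim)]
    for x in range(dim):
        if all((x >> (n - 1 - c)) & 1 for c in controls):
            b = (x >> shift) & 1
            for b_new in (0, 1):
                u[(x & ~(1 << shift)) | (b_new << shift)][x] += a[b_new][b]
        else:
            u[x][x] = 1 + 0j
    return u


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def close(u, v, tol=1e-9):
    return all(abs(p - q) < tol for ru, rv in zip(u, v) for p, q in zip(ru, rv))


def identity(dim):
    return [[complex(i == j) for j in range(dim)] for i in range(dim)]


phi = 0.4
plus, minus = cmath.exp(1j * phi), cmath.exp(-1j * phi)
t_gate = [[1, 0], [0, cmath.exp(0.25j * cmath.pi)]]
s_gate = [[1, 0], [0, 1j]]
s_dagger = [[1, 0], [0, -1j]]

# Same controls and same target: the pair equals one gate carrying the product (T.T = S).
merged_ok = close(matmul(gate(3, [0, 1], 2, t_gate), gate(3, [0, 1], 2, t_gate)),
                  gate(3, [0, 1], 2, s_gate))
# If the product of the 2x2 matrices is 1, the pair is the identity and can be removed.
same_wires = close(matmul(gate(3, [0, 1], 2, s_gate), gate(3, [0, 1], 2, s_dagger)),
                   identity(8))
# Each target is a control of the other gate: A = diag(1, e^{i phi}), B = diag(1, e^{-i phi}).
mutual = close(matmul(gate(2, [1], 0, [[1, 0], [0, plus]]),
                      gate(2, [0], 1, [[1, 0], [0, minus]])), identity(4))
# A diagonal B on a control qubit of b1, cancelled by A = e^{-i phi} times the unit matrix.
through_control = close(matmul(gate(2, [1], 0, [[minus, 0], [0, minus]]),
                               gate(2, [], 1, [[1, 0], [0, plus]])), identity(4))
```

All four flags evaluate to true. When the product of the two matrices is not the unit matrix, as in the first check, the pair is replaced by one gate instead of being removed.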

3.4 Exchange of two gates with help of one additional gate 

During Barenco's decomposition we work with circuits where NOT gates act on a control qubit of a neighbouring
generalized Toffoli gate $\wedge_m(\sigma_x)$. These pairs of gates do not commute and their order cannot be exchanged
even when we modify one of them. This is the reason why we generalized the well-known CNOT identity shown
in Figure 3b. Our generalization is depicted in Figure 3c, where the matrices A, B are not arbitrary, but restricted
as we describe below. In the case that there are no qubits t4, the requirement of equality of the transformations
performed by the circuits from Figure 3c only tells us that the matrix A must have vanishing diagonal elements.
But if qubits t4 are present, then we must also fulfill the condition $B = B^\dagger$, because only the gates b2 and b2n act
on computational-basis vectors with at least one qubit t4 in the state $|0\rangle$.
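
An exchange of this kind can be checked numerically on two qubits. The identity tested below is our own reading of the text, and we cannot confirm that it is drawn exactly like this in Figure 3: a one-qubit gate A with vanishing diagonal elements is moved across a controlled-B gate whose control it acts on, at the price of replacing B by its adjoint and adding one uncontrolled gate B on the target qubit. The helper names are hypothetical.

```python
import math


def gate(n, controls, target, a):
    """Matrix of a generalized Toffoli gate; qubit 0 is the leftmost qubit of the ket."""
    dim, shift = 2 ** n, n - 1 - target
    u = [[0j] * dim for _ in range(dim)]
    for x in range(dim):
        if all((x >> (n - 1 - c)) & 1 for c in controls):
            b = (x >> shift) & 1
            for b_new in (0, 1):
                u[(x & ~(1 << shift)) | (b_new << shift)][x] += a[b_new][b]
        else:
            u[x][x] = 1 + 0j
    return u


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def dagger2(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def close(u, v, tol=1e-9):
    return all(abs(p - q) < tol for ru, rv in zip(u, v) for p, q in zip(ru, rv))


def exchange_holds(a, b):
    """Check (A on qubit 0).(controlled-B) == (B on qubit 1).(controlled-B^+).(A on qubit 0)."""
    a_on_control = gate(2, [], 0, a)
    lhs = matmul(a_on_control, gate(2, [0], 1, b))
    rhs = matmul(matmul(gate(2, [], 1, b), gate(2, [0], 1, dagger2(b))), a_on_control)
    return close(lhs, rhs)


r = 1 / math.sqrt(2)
sigma_x = [[0, 1], [1, 0]]
b_unitary = [[r, 1j * r], [1j * r, r]]     # unitary but not Hermitian
a_antidiagonal = [[0, 1j], [1, 0]]         # vanishing diagonal elements
hadamard = [[r, r], [r, -r]]               # non-vanishing diagonal elements

cnot_case = exchange_holds(sigma_x, sigma_x)
generalized_case = exchange_holds(a_antidiagonal, b_unitary)
fails_for_hadamard = not exchange_holds(hadamard, b_unitary)
```

All three flags evaluate to true. The added gate B has one control qubit fewer than the controlled gate, and the controlled gate changes from B to its adjoint, so the two controlled gates coincide only for Hermitian B.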
Figure 3: Circuit identities. The one from c) holds if matrix A has vanishing diagonal elements.



4 Optimization algorithm 

The main idea of our approach to the optimization of quantum circuits containing generalized Toffoli gates is
based on using the properties of $\wedge_k(\cdot)$ gates described above. We drag the selected gate to the left until we
merge it with a neighbouring gate or we are not able to drag it further. After that, we try the same to the right
and then select another gate, and repeat the whole procedure. We repeat these steps as long as the number of gates
in the circuit keeps decreasing. We perform the dragging as follows: if the gates commute we exchange them; otherwise
we try to exchange them while modifying one of them. If it is not possible to use either of these options, we use the
exchange of two gates with the help of one additional gate, but only if there are no qubits t4 (see Figure 3c). In
addition, we use it only once before we succeed in merging the gates. This guarantees that the number of gates
in the circuit will not grow. When we use an exchange of two gates with the help of one additional gate, the
number of gates in the circuit stays the same, but one generalized Toffoli gate has fewer control qubits,
which leads to fewer basic gates in subsequent steps of Barenco's decomposition.
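
The drag-and-merge loop can be sketched in a few lines of plain Python. This is a deliberately reduced illustration and not the authors' program: it uses only plain commutation (tested numerically on the full matrices) and merging of gates with identical control and target qubits, and it omits the exchange with modification and the exchange with an additional gate. A gate is represented as a hypothetical tuple `(controls, target, matrix)`, and a dragged gate is committed to its new position only when the drag ends in a merge.

```python
def full_matrix(n, gate):
    """Matrix of a generalized Toffoli gate; qubit 0 is the leftmost qubit of the ket."""
    controls, target, a = gate
    dim, shift = 2 ** n, n - 1 - target
    u = [[0j] * dim for _ in range(dim)]
    for x in range(dim):
        if all((x >> (n - 1 - c)) & 1 for c in controls):
            b = (x >> shift) & 1
            for b_new in (0, 1):
                u[(x & ~(1 << shift)) | (b_new << shift)][x] += a[b_new][b]
        else:
            u[x][x] = 1 + 0j
    return u


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def close(u, v, tol=1e-9):
    return all(abs(p - q) < tol for ru, rv in zip(u, v) for p, q in zip(ru, rv))


def commute(n, g1, g2):
    u, v = full_matrix(n, g1), full_matrix(n, g2)
    return close(matmul(u, v), matmul(v, u))


def try_merge(first, second):
    """Merge two neighbouring gates; return None, [] (identity) or [merged gate]."""
    (c1, t1, a), (c2, t2, b) = first, second
    if c1 != c2 or t1 != t2:
        return None
    product = matmul(b, a)                 # `first` acts first, so it stands on the right
    return [] if close(product, [[1, 0], [0, 1]]) else [(c1, t1, product)]


def drag_and_merge(circuit, n, i, step):
    """Drag gate i left (step=-1) or right (step=+1); return the new circuit or None."""
    trial = list(circuit)
    while 0 <= i + step < len(trial):
        lo, hi = min(i, i + step), max(i, i + step)
        merged = try_merge(trial[lo], trial[hi])
        if merged is not None:
            trial[lo:hi + 1] = merged
            return trial
        if not commute(n, trial[lo], trial[hi]):
            return None                    # cannot drag the gate any further
        trial[lo], trial[hi] = trial[hi], trial[lo]
        i += step
    return None


def optimize(circuit, n):
    circuit, i = list(circuit), 0
    while i < len(circuit):
        result = drag_and_merge(circuit, n, i, -1)
        if result is None:
            result = drag_and_merge(circuit, n, i, +1)
        if result is None:
            i += 1                         # select the next gate
        else:
            circuit, i = result, 0         # the circuit got shorter: start over
    return circuit


X = [[0, 1], [1, 0]]
S = [[1, 0], [0, 1j]]
H = [[2 ** -0.5, 2 ** -0.5], [2 ** -0.5, -(2 ** -0.5)]]
toffoli = (frozenset({0, 1}), 2, X)
reduced = optimize([toffoli, (frozenset(), 0, S), toffoli], n=3)
blocked = optimize([toffoli, (frozenset(), 0, H), toffoli], n=3)
```

In the first circuit the diagonal gate on a control qubit commutes with the Toffoli gate, so the two Toffoli gates meet and cancel, leaving one gate. In the second circuit the Hadamard gate blocks the dragging and all three gates remain; removing such obstacles is the purpose of the two exchange rules omitted here.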



5 Results 

Since the decomposition of Barenco et al. [1] leads to quantum circuits containing hundreds of
gates even in the case of 3-qubit operations, it is not feasible to perform the decomposition and the proposed
optimization by hand. This stimulated us to create a computer program which implements both Barenco's
decomposition and the optimization algorithm proposed in this paper. We can show analytically that, in the worst
case of a 2-qubit operation, our optimization improves Barenco's decomposition to obtain a circuit containing
10 CNOTs instead of 20 CNOTs. However, it is known that three CNOTs suffice to implement any 2-qubit
operation, therefore we see that the proposed optimization is only a partial one.

We examined the efficiency of our optimization numerically for different subsets of operators acting on various
numbers of qubits. Our approach starts by randomly generating an operator with a known upper bound on the
number of CNOTs needed for its implementation. This is done by computing the operation corresponding to a
circuit built from that number of CNOTs and randomly picked 1-qubit gates. Then we decompose this operator by
the decomposition of Barenco et al. with and without our optimization. We did this several times and evaluated
the results. One would expect that the number of CNOTs in the circuit created by Barenco's decomposition with
our optimization will strongly depend on the operator we are decomposing. We have found that, except for
operators generated by circuits containing artificially chosen 1-qubit gates, each decomposition with the
proposed optimization leads to a circuit with exactly the same number of CNOT gates (for the chosen number of
qubits). This is caused by a redundancy in the blocks of generalized Toffoli gates, which our optimization is not
able to remove in generic cases. For a more quantitative overview see Table 4. This table also shows a
comparison with the NQ [4] and the CS [5] decompositions, which are the best performing universal
decompositions in the worst case of unitary operators. The asymptotic number of CNOT gates used by Barenco's
decomposition to implement any n-qubit unitary is $O(n^3 4^n)$. Our numerical investigation suggests that the
asymptotics may be the same with the proposed optimization. The NQ and CS decompositions are more efficient
in general (creating roughly $\frac{1}{2} \times 4^n$ CNOT gates in the worst case of a unitary operator), because their
procedure contains steps which systematically remove a part of the redundancy introduced in previous steps.

Number of qubits n   Barenco's decomposition   Barenco's + our optimization   NQ    CS
2                    20                        10                             3     4
3                    576                       379                            21    26
4                    8000                      6278                           105   118
5                    91520                     76208                          465   494

Table 4: The number of CNOT gates in the circuits produced by various decompositions for a generic unitary
operator.
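
The trend in Table 4 can be made explicit with a little arithmetic. The sketch below copies the two Barenco columns of the table and computes the fraction of CNOT gates that remain after the optimization, together with the lower bound $\lceil \frac{1}{4}(4^n - 3n - 1) \rceil$ of Ref. [6] quoted in the Introduction. The variable and function names are illustrative only.

```python
# CNOT counts copied from Table 4, indexed by the number of qubits n.
barenco = {2: 20, 3: 576, 4: 8000, 5: 91520}
optimized = {2: 10, 3: 379, 4: 6278, 5: 76208}


def cnot_lower_bound(n):
    """ceil((4^n - 3n - 1) / 4), computed in exact integer arithmetic."""
    return -(-(4 ** n - 3 * n - 1) // 4)


# Fraction of the CNOT gates that remain after the optimization.
kept_fraction = {n: optimized[n] / barenco[n] for n in barenco}

for n in sorted(barenco):
    print(f"n={n}: kept fraction {kept_fraction[n]:.3f}, lower bound {cnot_lower_bound(n)}")
```

The fraction of gates that remain grows from 0.5 for n = 2 to about 0.83 for n = 5, which is consistent with the remark that the asymptotic scaling is probably not improved by the optimization. The lower bound evaluates to 3, 14, 61 and 252 CNOT gates for n = 2, ..., 5.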

6 Summary 

Finding an efficient quantum circuit for a given n-qubit operation is an important task in the quantum circuit
model of computation. A few universal procedures performing this task have been proposed, but their efficiency (the
number of created CNOT gates) is very often known only for the worst case of n-qubit unitary operators.
However, it is believed that interesting operations might require only a polynomial number of CNOT gates with
respect to the number of qubits. Hence, it is very important to know the performance of such universal
procedures on this kind of operators. Therefore, we proposed an optimization of the universal procedure by
Barenco et al., and examined its efficiency on operators realizable by a small number of CNOT gates. To
perform this task we created a computer program performing Barenco's procedure together with the proposed
optimization. The results show that this procedure is in general not as efficient as the NQ and CS decompositions
and still leaves some redundancy in the created circuit. On the other hand, our optimization is not restricted
to Barenco's decomposition and can be applied to any quantum circuit containing generalized
Toffoli gates, including circuits containing basic quantum gates. This can be useful once we have some
basic-gate circuit corresponding to the unitary operator we are decomposing. Our optimization can also be useful
in situations when the quantum algorithm is given as a sequence of efficient sub-circuits performing the
sub-tasks. A very similar idea to our optimization was proposed and extended to a slightly more general framework
by D. Maslov, G. W. Dueck and D. M. Miller in [11]. It is not possible to compare their numerical
results with ours properly, since they work with a different gate library containing one-qubit gates, the controlled-NOT gate,
and the controlled-sqrt-of-NOT gate. Roughly, however, we can say that the portion of the gates removed in the particular
examples they present is very similar to the portion of gates our optimization removes in the case of Barenco's
decomposition. All published optimizations of quantum circuits are based on exchanging sequences of gates
for shorter ones doing precisely the same thing. To propose better optimization strategies, it seems that we
need to understand more deeply what is computed in the considered part of the circuit.